Detecting sound events in basketball video archive

Author

  • Dongqing Zhang
Abstract

This report proposes a method for detecting sound events in a basketball game, with a focus on cheering. MFCC (Mel-frequency cepstral coefficient) features are used to distinguish cheering from speech and other confusable sounds. The MFCC features are fed into a neural network (NN) and classified into three classes: cheering, speech, and others. To improve the performance of the MFCC-NN classifier, a measure of temporal spectral variation is proposed, defined by the entropy of the LPC coefficients. Normalized energy is also used to eliminate false alarms caused by background noise. The outputs of these three channels (MFCC-NN, LPC coefficient entropy, and normalized energy) are then fused, and postprocessing techniques are applied to obtain robust results. For other events, such as dribbling, a template-matching approach is proposed. In experiments, the methods performed well on a very difficult soundtrack. The described method can be used for basketball video content retrieval and highlight extraction.
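The energy gating, fusion, and postprocessing steps mentioned in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the report's actual method: the function names, the thresholds, the AND-style fusion rule, and the majority filter used as postprocessing are all hypothetical, and the MFCC-NN and LPC-entropy channels are replaced by a precomputed per-frame cheering score.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples (ragged tail dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def normalized_energy(frames):
    """Short-time energy per frame, scaled to [0, 1] by the clip maximum."""
    energy = np.sum(frames ** 2, axis=1)
    peak = energy.max()
    return energy / peak if peak > 0 else energy

def fuse_and_smooth(cheer_score, energy, score_thr=0.5, energy_thr=0.1, win=5):
    """Hypothetical fusion: keep a frame only if the classifier score and the
    normalized energy both exceed their thresholds, then apply a majority
    filter over `win` neighbouring frames to remove isolated decisions."""
    raw = (cheer_score > score_thr) & (energy > energy_thr)
    padded = np.pad(raw.astype(int), win // 2, mode="edge")
    votes = np.convolve(padded, np.ones(win, dtype=int), mode="valid")
    return votes > win // 2

# Toy signal: quiet background, a loud burst, quiet background again.
x = np.concatenate([0.01 * np.ones(400), np.ones(400), 0.01 * np.ones(400)])
energy = normalized_energy(frame_signal(x, frame_len=100, hop=100))

# Assume a classifier that (wrongly) fires on every frame; the energy
# channel suppresses the low-energy false alarms.
cheer_score = np.full(len(energy), 0.9)
detected = fuse_and_smooth(cheer_score, energy)
```

In this toy case only the frames covering the loud burst survive. A real system would replace `cheer_score` with classifier outputs and would likely tune the thresholds and window length on labeled data.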


Similar resources

Multimodal Semantic Analysis and Annotation for Basketball Video

This paper presents a new multiple-modality method for extracting semantic information from basketball video. The visual, motion, and audio information are extracted from video to first generate some low-level video segmentation and classification. Domain knowledge is further exploited for detecting interesting events in the basketball video. For video, both visual and motion prediction informa...


A sensor-fusion method for detecting a speaking student

In this paper, we propose a method for detecting the location of a speaker who is the target of automatic video filming in distance learning and lecture archiving. A lecture video needs to show the face of the speaking student, which requires detecting the speaker's location. An acoustic sensor such as a microphone array is widely used to detect the location ...


Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in the machine learning community. While traditional approaches to video event detection have been used for a long time, recently developed deep-learning-based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...


Detection and separation of speech segment using audio and video information fusion

In this paper, a method for detecting and separating speech events in a multiple-sound-source condition using audio and video information is proposed. For detecting speech events, sound localization using a microphone array and human tracking by stereo vision are combined by a Bayesian network. From the inference results of the Bayesian network, the information on the time and location of speech ...


A fusion scheme of visual and auditory modalities for event detection in sports video

In this paper, we propose an effective scheme for fusing visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, where we classify video shots into several major or interesting classes, each of which has a clear semantic meaning. Among major shot classes we perform classification of the different auditory signal segments (i....




Publication date: 2001